


MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY 

A.I. Memo No. 675 May, 1982



Zero-crossings and Spatiotemporal Interpolation in Vision: 
aliasing and electrical coupling between sensors 

T. Poggio, H.K. Nishihara & K.R.K. Nielsen

Abstract: We will briefly outline a computational theory of the first stages of human vision according
to which (a) the retinal image is filtered by a set of centre-surround receptive fields (of about 5
different spatial sizes) which are approximately bandpass in spatial frequency and (b) zero-crossings
are detected independently in the output of each of these channels. Zero-crossings in each channel
are then a set of discrete symbols which may be used for later processing such as contour extraction
and stereopsis. A formulation of Logan's zero-crossing results is proved for the case of Fourier
polynomials and an extension of Logan's theorem to 2-dimensional functions is also proved. Within this
framework, we shall describe an experimental and theoretical approach (developed by one of us with
M. Fahle) to the problem of visual acuity and hyperacuity of human vision. The positional accuracy
achieved, for instance, in reading a vernier is astonishingly high, corresponding to a fraction of the
spacing between adjacent photoreceptors in the fovea. Stroboscopic presentation of a moving object
can be interpolated by our visual system into the perception of continuous motion; and this
"spatiotemporal" interpolation also can be very accurate. It is suggested that the known spatiotemporal
properties of the channels envisaged by the theory of visual processing outlined above implement an
interpolation scheme which can explain human vernier acuity for moving targets.

We consider, in particular, the problem of avoiding aliasing in the perifoveal visual field. It is
conjectured that gap junctions (or another form of coupling) between rods and cones are needed to
avoid aliasing outside the fovea. Possible implications for machine vision and imaging devices are
briefly discussed.

In the last seven years a new computational approach has led to promising advances in the
understanding of visual perception. This approach, which may be relevant not only for the information
sciences but also for the neurosciences, is mainly due to the late D. Marr and his colleagues. In this
article we will briefly describe this computational theory for the very first stages of vision, since it
provides a useful framework for approaching the problem of spatiotemporal acuity in human vision,
which is the main topic of the paper.¹



1.1 A Computational Approach

The central tenet of this approach is that vision is primarily a complex information processing task,
with the goal of capturing and representing the various aspects of the world that are of use to us. It is a 
feature of such tasks, arising from the fact that the information processed in a machine is only loosely 
constrained by the physical properties of the machine, that they must be understood at different, 
though interrelated, levels. This framework, formulated by Marr & Poggio (1976), was not new: H. 
Simon and especially L. Harmon emphasized a similar point of view in a more general context. 

In a process like vision it is useful to distinguish three levels over which one's descriptions and 
explanations of the process must range: a) computational theory, b) algorithm, c) implementation. 
These are not hard and fast divisions. The important point is that no explanation or set of explana- 
tions is complete unless it covers this range. To avoid possible misunderstandings, we wish to stress 
that this computational approach is not a substitute for the "traditional" methods and techniques 
of the neurosciences to which it is in fact complementary. It is probably fair to say that most 
physiologists and students of psychophysics have often approached a specific problem in visual per- 
ception with their personal "computational" prejudices about the goal of the system and why it does 
what it does. With few exceptions this heuristic attitude, although useful, remained at the level of 
prejudices; computational analysis was not a science, nor was it appreciated in the neurosciences that 

one was needed. 

¹ Some of the material for this paper has been drawn from Poggio (1981) and Fahle and Poggio (1981).
This state of affairs is hardly surprising. The difficulties of the vision process are often not
appreciated even now. Until the early 70's the fields of computer science and artificial intelligence
failed to realise that problems in vision are difficult. The reason, of course, is that we are extremely
good at it, but in a way which cannot be subjected to careful introspection. Today we know that the
problems are profound. "Ad hoc" methods and tricks have consistently failed. Marr realized what the
message was. A science of visual information processing was needed to analyze a given information
processing task and its basis in the physical world. Marr's work, from the breadth of the approach to
its rigorous detail in the analysis of specific problems, provides a methodological lesson for this new
field.



1.2 The Detection of Intensity Changes 

In this section we will outline one of the very first stages in the processing of visual information, the
computation of zero-crossings. The basic ideas, outlined by Marr in a paper (1976), have evolved
into a scheme (Marr & Poggio, 1977) based on bandpass filtering of the image through difference
of gaussians and detection of the associated zero-crossings. Marr and Hildreth (1980) have provided
a number of attractive arguments for justifying this scheme from a computational point of view,
although a complete formal theory is still lacking. We will outline here their main points.

The goal of the first step of vision is to detect changes in the reflectance of the physical surfaces 
around the viewer or in the surface orientation and distance. On various computational grounds, 
sharp changes in the image intensity turn out to be the best indicator of most physical changes in the 
surface. In natural images, intensity changes can and do occur over a wide range of spatial scales. It 
follows that their optimal detection requires the use of operators (that is filters) of different sizes. A 
sudden intensity change like an edge gives rise to a maximum or a minimum in the first derivative
of image intensities or equivalently to a zero-crossing in the second derivative. Marr and Hildreth
(1980) argue that the desired filter should take the second derivative of the image at a particular scale.

A convenient choice for the derivative in two dimensions is the Laplacian
$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$, and
the appropriate scale can be set by filtering the image with a 2-D Gaussian filter $G$, which optimally
satisfies specific constraints on the real world, particularly the fact that intensity changes arising from
physical objects are spatially localized at their own scale. Since the operations of taking the derivative
and blurring an image are linear, the overall transformation is equivalent to convolving the image
with the Laplacian of a gaussian distribution, that is with $\nabla^2 G$. As shown by fig. 1, this
corresponds to a centre-surround type of receptive field. Such a filter closely resembles the usual
descriptions of the ganglion cell receptive field and of the psychophysical channels in human vision as
the difference of two gaussians, an excitatory and an inhibitory one. Spatial filters with the
centre-surround organization shown in fig. 1 are of course bandpass in spatial frequency, although their
bandwidth is not very narrow.

Figure 1. A cross-section of the circularly symmetric centre-surround receptive field $\nabla^2 G$.
In summary, the process of finding intensity changes at a given scale consists of filtering the image
with a centre-surround type of receptive field, with a size reflecting the scale at which the changes
have to be detected, and then locating the zero-crossings in the filtered image (see fig. 2).

To detect changes at all scales, it is necessary only to add other channels, of different dimension,
and carry out the same computation for each channel independently.

Figure 2. The image (a) has been convolved with a centre-surround receptive field with the shape illustrated in Fig.
1. (b) shows the convolved image: positive values are shown white and negative black; white (black) values would then
represent the activity of the corresponding on-(off-) centre ganglion cells "looking" at the image. (c) The zero-crossings
profile contains rich information about the filtered image (b) as explained in the text. Similar independent filters of
smaller and larger sizes are needed to capture the whole information contained in (a). From Marr and Hildreth (1980).
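The two steps just summarized (centre-surround filtering at one scale, then locating zero-crossings) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used by Marr and Hildreth: the truncation of the kernel at 4σ, the subtraction of the kernel mean to force a zero response on uniform regions, the FFT-based convolution with zero padding, and the helper names `log_kernel`, `convolve_same` and `zero_crossings` are all choices made here for the example.

```python
import numpy as np

def log_kernel(sigma, radius=None):
    """Sampled Laplacian-of-Gaussian kernel, shifted so that it sums to zero."""
    if radius is None:
        radius = int(np.ceil(4 * sigma))        # assumed truncation radius
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()                         # uniform regions map to zero

def convolve_same(image, kernel):
    """Same-size 2-D linear convolution computed with zero-padded FFTs."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    shape = (ih + kh - 1, iw + kw - 1)
    product = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
    full = np.fft.irfft2(product, shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + ih, left:left + iw]

def zero_crossings(filtered, eps=1e-9):
    """Mark pixels whose right or lower neighbour has the opposite sign."""
    zc = np.zeros(filtered.shape, dtype=bool)
    zc[:, :-1] |= filtered[:, :-1] * filtered[:, 1:] < -eps
    zc[:-1, :] |= filtered[:-1, :] * filtered[1:, :] < -eps
    return zc

# A vertical step edge between columns 31 and 32 of a 64 x 64 image.
image = np.zeros((64, 64))
image[:, 32:] = 1.0
channel = convolve_same(image, log_kernel(sigma=2.0))
edges = zero_crossings(channel)
print(np.flatnonzero(edges[32]))   # columns flagged in one interior row
```

Running the same three functions with several values of `sigma` gives one zero-crossing map per channel; zero padding also produces spurious crossings near the image border, which a real system would have to handle separately.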



Zero-crossings in each channel thus form a set of discrete symbols which are used for later process- 
ing such as stereopsis (Marr & Poggio, 1977). Marr and Hildreth, in particular, addressed the problem 
of how to combine zero-crossings from different channels into primitive edge elements taking ad- 
vantage of physical constraints obeyed by the visual world. These and other symbolic descriptors 
then represent what Marr called the "raw primal sketch". Instead of describing these parts of the
theory, we shall discuss in more detail the zero-crossing detection process and the corresponding
physiological and psychophysical evidence. Zero-crossings in the output of centre-surround channels
represent a natural way of obtaining a discrete, symbolic representation of the image from the original
"continuous" intensity values. Some recent deep results in complex analysis by B. Logan (1977) seem
to support this scheme in a way which we have found intriguing and fascinating ever since we came
across his remarkable paper. His main theorem (see Appendix 1a) states that a bandpass one-dimensional
signal with a bandwidth of less than 1 octave can be reconstructed completely, up to a constant
multiplicative factor, from its zero-crossings alone (if some relatively weak conditions are satisfied).
From the point of view of visual information processing there is clearly no need to reconstruct the
original signal. But the theorem suggests that the "discrete" symbols provided by zero-crossings are very
rich in information about the original image. Unfortunately, more definite claims are as yet impossible,
since an extension of the theorem to images (Appendix 1a and especially 1b; see also Marr et al.,
1979) does not characterize completely the two-dimensional problem. In addition, centre-surround
receptive fields are not ideal bandpass filters, as required by Logan's version of the theorem (see
Appendices 1a, 1b). Clearly zero-crossings alone do not contain all the information (such as absolute
intensity values), but as one of us has found in an empirical investigation, natural images filtered with
$\nabla^2 G$ operators can be reconstructed to a good approximation from their zero-crossings and slopes.
A successful extension of the Logan type of analysis to two-dimensional patterns may therefore
represent one of the critical steps for perfecting this computational analysis of low level vision into a
solid theory.



1.3 The Line Detectors/Fourier Analysis Controversy: A New Synthesis? 

The previous ideas based on Logan's type of results not only lead to a satisfactory scheme for
the analysis of intensity changes in an image; they also have fascinating implications for visual
psychophysics and physiology, since they seem to account for basic properties of the first part of the
visual pathway. In particular these ideas explain why the image is filtered early on by approximately
bandpass centre-surround receptive fields; they make more precise the notion of "edge-detectors"
for extracting a symbolic description which contains full information about the image; and they state
that this can be achieved only if the image was previously filtered with several independent bandpass
channels, i.e. centre-surround receptive fields. As an immediate consequence these ideas also
provide a solution of the long-standing controversy about edge-detectors versus frequency channels
in the psychophysics and physiology of primate vision. The first stage of vision would indeed be
performed to a good extent by "edge" detectors (actually zero-crossing detectors) and certainly not
by Fourier analyzers; but in order for the zero-crossing detectors to extract meaningful information
it is necessary that they operate on the output of independent channels, roughly bandpass in spatial
frequency.

Many results from the psychophysics and physiology of early vision can be easily interpreted in this
new framework. It is, for instance, not too unreasonable to propose that the $\nabla^2 G$ filtering stage is
performed by ganglion cells of the retina and LGN, whereas a subclass of simple cells may represent
oriented zero-crossing segments. In this context it is not important how this is implemented in detail:
one of the several possibilities is that simple cells may read the zero-crossings profile from the fine
grid of small cells in layer 4C of the striate cortex, where a reconstruction of the filtered image, at
different scales, may be performed (via intracortical inhibition) with the goal of providing a very
accurate position of the zero-crossings (see later).

Several gaps have still to be filled in the computational theory of zero-crossings. For instance, 
since zero-crossings do not represent the complete information about the image, it is important to 
characterize the other primitives that are needed. At the other levels of explanation experimental 
evidence for or against zero-crossings is of course highly desirable. Since the summer day in
Tübingen when D. Marr with one of us first formulated the idea of zero-crossings in the output of
independent, roughly bandpass filters, we cannot help feeling that its experimental validation (or
falsification) is of critical importance for further developments of our approach to low-level vision.
2. Visual information processing: why spatiotemporal interpolation?

Any visual processor with human-level performance must be capable of analyzing time-varying im- 
agery. The analysis starts with the spatio-temporal interpolation of the raw visual input. The spatial 
resolution of the photosensitive image available for processing is limited by the sampling density 
of the photosensitive elements in the sensor and by noise. Image motion introduces the additional 
problem of temporal resolution. The limiting factors are the frame rate and the integration time deter- 
mined by the sensitivity of the photosensitive elements. This is of little consequence for a stationary 
scene, but for moving targets it poses the problem of motion smear. 

The problem of high spatiotemporal resolution can be partially overcome by using better sensors
with larger arrays and higher frame rate. There are, however, technological and physical limits to
the spatiotemporal resolution that can be achieved in this manner, since increasing the spatial and
temporal sampling rate reduces the number of photons per sensor element per cycle. Consider that
since the number $x$ of photons is Poisson distributed, $\sigma = \sqrt{x}$. The number of distinguishable
levels was estimated by Barlow (1981) to be roughly $n = 2\sqrt{x}$. Thus 8 bits of resolution ($n = 256$)
requires about $2^{14} \approx 1.6 \times 10^4$ photons. Note that the light intensity of a bright surface is
$10^4$ cd/m$^2$ and this means $10^4$ photons per 50 msec per sensor, assuming a sensor efficiency similar
to the human cones!
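The photon budget above can be checked numerically. The short sketch below simply takes the relation $n = 2\sqrt{x}$ attributed to Barlow (1981) at face value; the function names are hypothetical and introduced only for this illustration.

```python
import math

def photons_for_levels(n_levels):
    """Photons x needed so that n = 2 * sqrt(x) levels can be distinguished."""
    return (n_levels / 2.0) ** 2

def levels_for_photons(x):
    """Distinguishable levels n = 2 * sqrt(x) for x photons (Poisson noise)."""
    return 2.0 * math.sqrt(x)

needed = photons_for_levels(2 ** 8)       # 8 bits of resolution
available = 1e4                           # photons per sensor per 50 msec (from the text)
levels = levels_for_photons(available)

print(f"photons needed for 8 bits: {needed:.0f}")
print(f"levels from 1e4 photons:   {levels:.0f} ({math.log2(levels):.2f} bits)")
```

Under this relation, halving the integration time or the sensor area halves $x$ and therefore costs half a bit of intensity resolution, which is the trade-off the paragraph describes.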

Fortunately, the performance of a given sensor can be improved by appropriate spatiotemporal 
interpolation schemes. As we have seen, using such processes the human visual system achieves an 
extremely high spatiotemporal resolution compared to the sampling density of the photoreceptors and 
their integration time.

In summary then, temporal acuity, spatial acuity and motion smear are different facets of the same 
general problem posed to a visual processor by time varying imagery. We turn now to examine how 
the human visual processor deals with it. 



2.1 Visual acuity in human vision 
Since the first measurements of vernier acuity in 1892 by Wuelfing in Tübingen, the extraordinary
accuracy with which the human eye can estimate the relative positions of lines or other features in
the visual field has represented a long-standing puzzle in vision research. Acuity of this type, also
called hyperacuity, can be measured in a variety of situations. A typical example is the acuity found
in reading a vernier (see inset of fig. 8a). This can be as fine as 5" of arc (Westheimer and McKee,
1975), that is 0.02 mm at 1 metre distance. The astonishing precision of this performance can be seen
when the optical properties of the human eye are considered. In the fovea the hexagonal grid of cones
samples the visual image with a sampling interval of no less than 25", well matched to the optical
point spread function of the eye (its gaussian core has a half width of about 45", corresponding to a
spatial frequency of 60 cycles/degree).
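The figures quoted in this paragraph can be reproduced with a small-angle conversion. The sketch below is an illustrative calculation only; the helper `arcsec_to_mm` is a name chosen here, and the 25" cone spacing is the lower bound given in the text.

```python
import math

ARCSEC_IN_RAD = math.pi / (180.0 * 3600.0)

def arcsec_to_mm(angle_arcsec, distance_m):
    """Lateral extent (mm) subtended by a small visual angle at a given distance."""
    return math.tan(angle_arcsec * ARCSEC_IN_RAD) * distance_m * 1000.0

vernier_threshold = 5.0   # seconds of arc (Westheimer and McKee, 1975)
cone_spacing = 25.0       # seconds of arc, foveal sampling interval

offset_mm = arcsec_to_mm(vernier_threshold, 1.0)
fraction_of_spacing = vernier_threshold / cone_spacing
print(f"{offset_mm:.3f} mm at 1 m; {fraction_of_spacing:.2f} of the cone spacing")
```

The threshold is thus about one fifth of the spacing between neighbouring cones, which is why this performance is called hyperacuity.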



Most remarkably of all, vernier acuity is not affected by movement at constant velocity of the
target in a velocity range from 0°/sec to at least 4°/sec (Westheimer & McKee, 1975). This means
that a subject can detect the relative position of two lines to within a fraction of a receptor diameter
(and spacing) while the whole pattern is moving across 70 receptors in 150 msec. Recently, evidence
has been accumulating which suggests that the visual system is able to perform a very precise
temporal interpolation as well, by reconstructing the spatial pattern of activity at moments intermediate
between discrete temporal presentations (Barlow, 1979). The most telling demonstration, apart from
cinematography, was introduced by D. Burr (1979a, see also Morgan, 1980) and is shown in the top
inset of fig. 8c. Vernier line segments are displayed stroboscopically at a series of stations to portray
a moving vernier; an illusory displacement occurs if the line segments are accurately aligned in space
but are displayed with a few milliseconds delay in one sequence relative to the other. Not only do the
segments appear to move smoothly from one station to the next but also, between the strobes, they
are seen to occupy positions between those where they are actually exposed. The accuracy of detecting
the equivalent displacement is again in the vernier acuity range, provided that the target moves at
constant speed and elicits a clear sensation of motion. One is forced to conclude that not only spatial
but also temporal interpolation is performed in the visual system to preserve acuity (and resolution)
for objects in motion (see Barlow, 1979).

It is clear that the attainment of such spatiotemporal accuracy does not break any physical law (see
Westheimer, 1976). As pointed out by Barlow (1979) and by Crick et al. (1980), the classical sampling
theorem allows a correct reconstruction of the visual input from a set of discrete samples in space
and time since the LGN signal is bandlimited in temporal and spatial frequency by the photoreceptor
kinetics and the eye's optics respectively. In particular, Crick et al. have suggested (similarly to
Barlow) that the fine grid of granule cells in layer IVc of the striate cortex performs an interpolation
on the output of the LGN fibres, with the goal of representing the position of zero-crossings (the
boundaries between activity in an ON and OFF ganglion cell layer) with a very high accuracy (see
also Marr and Hildreth, 1980 and Marr et al., 1979).

Although spatiotemporal interpolation can be well understood in terms of information theory,
the astonishing performance of the visual system seems to require an algorithm and corresponding
mechanisms of great ingenuity and precision. As we hinted earlier, an understanding of visual
interpolation may also be quite interesting from a purely information processing point of view. High
resolution, smear-free real time imagery could benefit significantly from this study of human vision.
Here we investigate some properties of this spatiotemporal interpolation. In particular, we examine its
performance for a range of "sampling intervals" in space and time.



2.2 Methods 

Figure 3. Vernier resolution threshold of spatial offset for different separations $\Delta x$ between the stations as a function of velocity.
Fig. 3a shows the data from subject AK, fig. 3b from subject TV. The standard deviation of the data is about 25% of the threshold
value for fig. 3a and 20% for fig. 3b. In fig. 3a the point for $\Delta x = 1'$ and $v$ = [illegible]°/sec was measured masking the beginning and
the ending of the trajectory; the same procedure did not change the threshold for the point at $v$ = [illegible]°/sec. Of the two points
at $\Delta x = 2.5'$ and $v = 25$°/sec in fig. 3a, the worse value has been measured under the "masking" condition whereas the
better one was measured in the standard way. In fig. 3b also the point at $\Delta x = 2.5'$ and $v = 25$°/sec was measured with zero
offset at the first and last station (from Fahle and Poggio, 1981).

The vernier target used in these experiments consisted of a thin vertical bar made up of two segments.
The stimuli were generated on a Tektronix 604 display under the control of analog electronics. Each
bar was intensified for 0.1 msec at $\Delta t$ msec intervals at $n$ successive stations horizontally displaced by
a separation $\Delta x$. Each of the two segments making up the bar was 24' high and 1.5' wide, intensified
to a luminance of about 50 times detection threshold on a background of 10 cd/m$^2$. During an
experimental run, a target was presented every 3 seconds. Brief displays of $n \cdot \Delta t = 150$ msec,
with randomized direction of motion (terminating at the central fixation point), were used to prevent
effective pursuit eye movements (Westheimer, 1954). The experiments measured

a) the acuity for detection of real vernier offsets of the two segments by $\delta x$ seconds of arc

b) the acuity for detection of apparent vernier offsets produced by delaying the presentation of the
lower or upper segment, displayed at the same sequence of stations, by $\delta t$ msec

c) the acuity for detection of mixed vernier offsets produced by a real spatial offset $\delta x$ together with
a temporal delay $\delta t$ of opposite sign.

In a forced choice task the subject was required to signal whether the bottom segment was
displaced to the right or to the left of the top segment by setting a binary switch. Acuity was determined
by the standard criterion of 75% correct identification. In all experiments reported here T is constant 
(T = 150 msec) and, as a consequence, the number of stations n is variable (n = 2 to 95). More details 
about the methods are given in Fahle and Poggio (1981). 
Figure 4. Vernier resolution thresholds of temporal offset for different separations between the stations as a function of velocity.
Fig. 4a shows the data from subject AK, fig. 4b from subject TV. The standard deviation is about 20% of the threshold values for
subject AK and 18% for subject TV (from Fahle and Poggio, 1981).

2.3 The Spatial Type of Acuity: Dependence on Velocity ($v$) and Separation ($\Delta x$)

The results for spatial offsets (with simultaneous presentation of the two segments at each station) are 
shown in figs. 3a,b. The main result is that spatial acuity is relatively independent of the separation 
between the stations and of the velocity of the target up to rather large velocities. These data confirm 
and extend Westheimer's and McKee's results (1975), which showed that vernier acuity is unaffected
by rate of movement from 0°/sec up to 4°/sec. Our results imply that this type of vernier acuity is
relatively independent of $\Delta t$, the strobe interval.



2.4 The Temporal Type of Acuity: Dependence on $v$ and $\Delta x$

Figs. 4a,b show the results for temporal offsets. The accuracy of detecting the equivalent
displacement is in the classical vernier acuity range (compare Burr, 1979a,b): the best value for observer AK
was 8" for spatial and 5" for temporal offset at comparable separations and velocities. Our main new
result is that although acuity does not break down for large separations between the stations, at least
up to half a degree, it deteriorates significantly almost in proportion to $\Delta x$ (see fig. 5).

Figure 5. Fig. 5a shows the best vernier resolution threshold (with temporal offset) for each separation $\Delta x$.
The data are from three subjects (partly from fig. 4a and 4b). O AK; O TV; X HW. In fig. 5b the velocity
$v$ for which optimal vernier resolution is found is plotted against the separation $\Delta x$. Same data as in fig. 5a.
From Fahle and Poggio (1981).

Vernier acuity of this temporal type is bad at low and high speed. As already clearly demonstrated 
by Burr (1979a,b) apparent motion is necessary for temporal offsets to be seen as spatial offsets. In 
our experiments, deterioration of acuity at low velocities could be due to the speed per se as well as to 
the lower number of stations (because our total presentation time is constrained to T = 150 msec the 
stimulus consisted, at the lowest velocities, of two stations). In any case, deterioration of acuity at low 
velocities can be linked with a decreased sensation of motion. 

A second important result is that the range of velocities for which temporal interpolation is good
shifts upwards for larger separations between the stations. The fact that at higher separations higher
velocities are required for good resolution suggests that a more revealing parameter is the time
interval $\Delta t$ between the strobes. In fact, at any separation $\Delta x$, temporal interpolation is optimal for a
temporal interval $\Delta t$ between 20 msec and 50 msec.
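Since the stations are visited at constant speed, velocity, separation and strobe interval are tied by $v = \Delta x / \Delta t$, so a fixed optimal range of $\Delta t$ translates into a velocity range that scales with $\Delta x$. The following sketch makes that conversion explicit; the function names and the default 20 to 50 msec window (taken from the paragraph above) are assumptions of this illustration.

```python
def strobe_interval_ms(separation_arcmin, velocity_deg_per_s):
    """Time between successive stations: dt = dx / v, returned in msec."""
    return (separation_arcmin / 60.0) / velocity_deg_per_s * 1000.0

def optimal_velocity_range(separation_arcmin, dt_min_ms=20.0, dt_max_ms=50.0):
    """Velocities (deg/sec) for which dt falls inside [dt_min_ms, dt_max_ms]."""
    separation_deg = separation_arcmin / 60.0
    slowest = separation_deg / (dt_max_ms / 1000.0)
    fastest = separation_deg / (dt_min_ms / 1000.0)
    return slowest, fastest

for dx in (2.5, 7.5, 15.0, 30.0):          # separations in minutes of arc
    lo, hi = optimal_velocity_range(dx)
    print(f"dx = {dx:4.1f}'  ->  v between {lo:5.2f} and {hi:5.2f} deg/sec")
```

For example, a separation of 2.5' at 1.1°/sec corresponds to a strobe interval of about 38 msec, inside the optimal window.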
2.5 The Effect of Blur on Spatial and Temporal Acuity

Standard vernier acuity is known to be affected, as one would expect, by attenuation of the high
spatial frequencies of the vernier pattern (see for instance Stigmar, 1971). Is temporal interpolation
also degraded in the same way?

We have performed some experiments to answer this question by placing a ground glass screen at 
1 cm in front of the display. When a sharp line is viewed through such a ground glass screen the 
resulting light distribution has an approximately Gaussian line spread function with a width at
half-height of at least 15', corresponding to a cutoff frequency of around 3-4 cycles/deg. Our data show
that in the experimental situation of fig. 4, blur of the pattern improves acuity at large separations and
velocities. Fig. 6 compares directly for the same observer and for the same separation the effect of
blur on spatial and temporal interpolation. Westheimer's type of acuity is degraded by blur, whereas
Burr's type of acuity improves dramatically with blur (at high velocities). Out of five observers only in
one case did blur of the pattern cause a reduction in temporal vernier acuity at high separations and 
velocities. 

These data again show that temporal hyperacuity has different characteristics from spatial hyper- 
acuity. 
2.6 Spatial vs. Temporal Offset

The apparent offset $\delta x'$ produced by temporal delay $\delta t$ should follow the ideal relationship
$\delta x' = v\,\delta t$. As shown by our data the sign of the offset is indeed correctly detected. Does its size
also satisfy this relation? How faithful, in other words, is temporal interpolation? To answer this
question we measured the temporal delay $\delta t$ needed to compensate for a given real spatial offset
$\delta x$ for different conditions.
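The ideal relationship $\delta x' = v\,\delta t$ is easy to evaluate for the range of delays and velocities used here. The sketch below is only a unit-conversion aid; the function names are hypothetical, and the ideal relationship is assumed to hold exactly, which is precisely what the compensation experiment tests.

```python
def equivalent_offset_arcsec(velocity_deg_per_s, delay_ms):
    """Ideal apparent offset dx' = v * dt, in seconds of arc."""
    return velocity_deg_per_s * 3600.0 * delay_ms / 1000.0

def compensating_delay_ms(offset_arcsec, velocity_deg_per_s):
    """Delay dt (msec) whose ideal apparent offset equals a real offset dx."""
    return offset_arcsec / (velocity_deg_per_s * 3600.0) * 1000.0

v = 1.1                                      # deg/sec
print(equivalent_offset_arcsec(v, 1.0))      # apparent offset of a 1 msec delay
print(compensating_delay_ms(5.0, v))         # delay matching a 5" real offset
```

At 1.1°/sec a delay of 1 msec corresponds to roughly 4" of arc, so delays of the order of a millisecond are already in the vernier acuity range.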

Fig. 7 shows that for a separation $\Delta x = 2.5'$ and a velocity $v = 1.1$°/sec the apparent offset
$\delta x' = v\,\delta t$ matches rather closely the real spatial offset $\delta x$. Under these conditions spatiotemporal
interpolation is indeed rather precise (compare Burr and Ross, 1979). It is not so for higher velocities
and/or larger separations (fig. 5). The temporal offset needed to compensate for a real spatial offset is
then much larger.

Figure 6. The effect of blur on spatial and temporal interpolation as a function of
velocity for a separation between the stations $\Delta x = 15'$. Vernier resolution of a spatial
offset is measured with (•) and without blur (o). Vernier resolution of a temporal offset
is also shown with ( ) and without ( ) blur. The screen was blurred as described in the
text. Notice that the first point for spatial offset is for $v = 0$°/sec. The observer is TV.
The standard deviation is about 20% of the threshold values. From Fahle and Poggio
(1981).



3.1. Spatiotemporal Interpolation: How is it Done? 

The previous results constrain the problem of hyperacuity tightly enough to justify a theoretical
analysis of how spatiotemporal interpolation may be done in the visual system. The precise meaning
of interpolation in terms of our visual stimuli is a well defined question, and this is the main point to
discuss.

3.1.1. A Simple Illustration 

Fig. 8 illustrates a very simple scheme for achieving spatiotemporal interpolation of a visual pattern. 
Figure 7. Temporal ($\delta x'$) vs. spatial ($\delta x$) offset in the compensation experiment. The
ordinate shows the temporal offset (in equivalent spatial units, $\delta x' = v \cdot \delta t$) needed to
compensate the spatial offset shown on the abscissa. • is for a separation between the
stations $\Delta x = 2.5'$ and a velocity $v = 1.1$°/sec ($\Delta t = 37$ msec). X is for $\Delta x = 2.5'$
and $v = 5.2$°/sec ($\Delta t = 8$ msec). O is for $\Delta x = 7.5'$ and $v = 4.1$°/sec ($\Delta t = 30$ msec).
Larger separations yield an even greater mismatch. The continuous diagonal
indicates the loci of perfect compensation. Subject TV. From Fahle and Poggio (1981).

The elements of this scheme could be interpreted as cells with associated receptive fields and temporal 
impulse responses. Alternatively, Fig. 8 represents a computational scheme for spatiotemporal inter- 
polation. Visual input is sampled in space by an array of cells with a sampling density high enough to 
preserve the whole of the spatial information (in accordance with the sampling theorem). The input 
is then reconstituted in more detail on a finer grid of cells by convolving the sampled values with the 
function sinc x. In effect each cell of the interpolation layer weights its inputs according to a
centre-surround receptive field. A variety of filters (i.e. "receptive fields") are capable of performing a correct
interpolation, especially in two spatial dimensions (see Crick et al. 1980). 
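The spatial part of this scheme, reconstruction on a finer grid by weighting the sampled values with a sinc function, can be sketched as follows. This is a minimal one-dimensional illustration with a finite number of samples, not a model of the cortical circuitry; the function name `sinc_interpolate`, the test signal and the grid sizes are assumptions made for the example. Note that `np.sinc(u)` is the normalized sinc, $\sin(\pi u)/(\pi u)$.

```python
import numpy as np

def sinc_interpolate(samples, sample_positions, query_positions):
    """Whittaker-Shannon interpolation of uniformly spaced samples."""
    spacing = sample_positions[1] - sample_positions[0]
    # Each row holds the "receptive field" weights of one cell of the fine grid.
    weights = np.sinc((query_positions[:, None] - sample_positions[None, :]) / spacing)
    return weights @ samples

# Coarse "receptor" grid with unit spacing; the signal is band-limited well
# below the Nyquist frequency (0.1 cycles per receptor spacing).
receptors = np.arange(-200, 201, dtype=float)
true_zero = 0.3                                   # zero-crossing between two receptors
signal = lambda t: np.sin(2 * np.pi * 0.1 * (t - true_zero))

fine_grid = np.linspace(0.0, 1.0, 101)            # 100 cells per receptor spacing
reconstructed = sinc_interpolate(signal(receptors), receptors, fine_grid)

# First fine-grid cell at which the reconstructed signal becomes positive.
crossing = fine_grid[np.argmax(reconstructed > 0)]
print(f"estimated zero-crossing: {crossing:.2f} (true value {true_zero})")
```

The zero-crossing is recovered to within a small fraction of the receptor spacing, which is the sense in which interpolation on a fine grid supports hyperacuity; the residual error comes from truncating the sinc sum to a finite array.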



If the input intensity distribution is presented at discrete instants in time, temporal interpolation
can be achieved by suitable temporal low pass properties of each individual pathway. If the temporal
interval between presentations is small enough the effect of the filter is to reconstruct the original
continuous temporal input. Spatial interpolation can then operate at each instant of time (this scheme
would of course operate successfully for continuous movement of a pattern).

Figure 8. (a) A simple scheme for spatiotemporal interpolation. The input pattern is sampled by an array
of "cells". Spatial interpolation is accomplished on a finer interpolation grid of cells each one weighting the
sampled values with a sinc shaped receptive field (shown in the lower inset). Temporal interpolation is obtained
by filtering with an appropriate low-pass or band-pass filter each of the input channels (its impulse response
is shown in the upper inset). Thus a series of discrete frames of a moving pattern can be interpolated (see
Theorem 1 in Appendix 2) into a continuous temporal function in each of the channels. The spatial input
distribution outlined here represents an intensity edge as seen by centre-surround ganglion cells. (b) The spatial
interpolation process in Fourier space. Interpolation is equivalent to filtering out the side lobes originated by
the sampling process. Temporal interpolation can be interpreted in a similar way. From Fahle and Poggio
(1981).

Fig. 8b shows the Fourier interpretation of the spatial interpolation process (interpolation in time
can be interpreted in a similar way). The effect of sampling is to replicate the original spectrum in an
infinite number of side lobes. Spatial interpolation - i.e. reconstruction of the original function from
its samples - is accomplished by filtering out all side lobes but the central one - which is the original
spectrum.
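
The replication of the spectrum can be made concrete with a small numerical check. In the sketch
below (the frequencies, the sampling interval and the function name are illustrative assumptions),
a cosine at frequency f and a cosine at f + 1/Δ, i.e. shifted by exactly one side-lobe spacing,
produce the same samples when sampled at interval Δ. This is why the side lobes must be removed
by a filter and cannot be told apart from the samples alone.

```python
import math


def sample_cosine(freq, spacing, n):
    """Sample cos(2*pi*freq*x) at x = 0, spacing, 2*spacing, ..."""
    return [math.cos(2 * math.pi * freq * k * spacing) for k in range(n)]


spacing = 0.1                       # sampling interval; replicas are 1/spacing = 10 apart
f_low = 3.0                         # inside the central lobe (below 1/(2*spacing) = 5)
f_high = f_low + 1.0 / spacing      # same component, one side lobe further out

low = sample_cosine(f_low, spacing, 50)
high = sample_cosine(f_high, spacing, 50)
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(f"largest difference between the two sample sets: {max_diff:.2e}")
```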



This model is probably the simplest conceivable scheme. In it, interpolation in space and time are
performed independently, since the temporal dependence of the input is not constrained in any way.
We now consider the conditions under which this scheme can be effective.

3.1.2 Remarks on Interpolation 

Before embarking on an analysis of various interpolation schemes, it is appropriate to make a few 
general points which arise from the discussion so far. 

First, the process of computing intermediate values from samples does not depend on the existence
of a finer retinotopic grid of "cells", where the results are represented. All filtering transformations
indicated in Fig. 8 could be carried out at a rather symbolic level for only a few distinguished points.
Thus, it is important to keep separate the problem of a process from the problem of representing its
output. This paper is directly concerned only with the first issue.

Second, the goal of the interpolation process may be far more modest than a full reconstruction of
the input distribution. As suggested by Crick et al. (1980), the aim of interpolating the ganglion cells'
activity is to provide the position of the zero-crossings (where activity switches from the on-centre
to the off-centre cells) with high accuracy. This can be achieved by using very simple interpolation
functions such as a normal centre-surround receptive field (Marr et al., 1980).

3.1.3 More Complex Interpolation Schemes are Required 

The scheme of Fig. 8 can provide a correct reconstruction of a spatiotemporal input sampled at
intervals Δx (in space) and Δt (in time) only when the input function is bandlimited in spatial (by
f_x) and temporal (by f_t) frequencies in such a way that Δx < 1/(2 f_x) and Δt < 1/(2 f_t) (theorem 1 in
Appendix 2). The image which reaches the retina is indeed bandlimited in spatial frequencies to less
than about 60 cycles per degree by the diffraction-limited optics of the eye. Furthermore, a temporal
cutoff is imposed at the level of the photoreceptors by their limited temporal resolution. The scheme
of Fig. 8 can therefore correctly reconstruct an image sampled at intervals of less than 30" in space
(for the 2-D case see Crick et al., 1980). Temporal samples of the photoreceptor activity could be
interpolated under similar conditions (though regular temporal sampling in our visual system is highly
implausible).

Since the spacing of the photoreceptors is almost exactly matched to the eye's optics, interpolation
in normal vision - when the image is a continuous function of time and space - can be accounted
for by simple schemes like that of Fig. 8. In particular, such models could account for the vernier
acuity measured with real continuous motion of the retinal image. When, however, motion of an
object is simulated by presenting the image at discrete positions at separate instants, the conditions of
theorem 1 are in general no longer satisfied. In our experiments we present to the eye an image which
is already sampled either in time (Westheimer type of stimulus) or space (Burr type of stimulus) or
both. We enforce arbitrary sampling intervals Δx and Δt on the system before the bandlimiting
operations of the eye's optics and of the receptor kinetics come into play. Under these conditions
the input function g(x, t) is not ensured to be appropriately bandlimited before spatial or temporal
sampling occurs. The scheme of Fig. 8 should for instance perform poorly when the input function
is sampled in space at intervals Δx significantly coarser than the photoreceptor array. Burr's and our
data, however, show that under these conditions our visual system performs significantly better. We
are clearly forced therefore to consider other types of interpolation schemes.
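
The 30" figure quoted above follows directly from the sampling condition: a cutoff of 60 cycles per
degree allows a sampling interval of at most 1/(2 · 60) degree. The short sketch below repeats this
arithmetic and applies the same test to a station separation of 2.5', one of the values used in the
compensation experiment. The function names are illustrative assumptions.

```python
def nyquist_interval_arcsec(cutoff_cycles_per_deg: float) -> float:
    """Largest admissible sampling interval, in seconds of arc, for a given cutoff."""
    return 3600.0 / (2.0 * cutoff_cycles_per_deg)


def is_adequately_sampled(interval_arcsec: float, cutoff_cycles_per_deg: float) -> bool:
    """True when the interval is finer than the limit set by the sampling theorem."""
    return interval_arcsec < nyquist_interval_arcsec(cutoff_cycles_per_deg)


limit = nyquist_interval_arcsec(60.0)               # optical cutoff of about 60 cycles/deg
stations_ok = is_adequately_sampled(2.5 * 60.0, 60.0)  # stations 2.5' = 150" apart
print(f"limit: {limit:.0f} arcsec, 2.5' stations adequately sampled: {stations_ok}")
```

A stimulus presented only at stations 2.5' apart is therefore sampled about five times too coarsely
for the simple scheme of Fig. 8.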

3.2.1 The Spatiotemporal Spectrum of a Moving Vernier

Our analysis of alternative interpolation schemes begins with the description in frequency space of
the physical stimuli corresponding to Westheimer's and Burr's experimental situations. When a spatial
pattern g(x) moves continuously at constant speed, the resulting spatiotemporal distribution of
excitation on the retina has a simple representation in the Fourier space of temporal (f_t) and spatial (f_x)
frequencies. Its Fourier transform takes values only on the diagonal line shown in fig. 9a with a slope
equal to the velocity (see Appendix 2). For each spatial frequency contained in the pattern, there is
a unique temporal frequency corresponding to it. Curtailing the duration of motion (in our case to
T = 150 msec) spreads the Fourier transform over a large area of temporal and spatial frequencies,
changing the narrow line into a wider area. The spread (along the f_t axis) is the same for all our data.
Thus the line supports shown in fig. 9 must be interpreted as being spread along f_t as a sinc function.
For T = 150 msec the width of the spread is about 14 Hz for the central lobe of the sinc function and
28 Hz for the central lobe plus the first negative side lobe on both sides. The retinal stimulus elicited
by continuous motion of a vernier at constant velocity can be described in this way (see Appendix 2).
The upper and the lower segment have the same line support on the (f_x, f_t) plane. Their Fourier
transforms differ at all frequencies only by a phase factor which mirrors the spatial offset. The correct
detection of this information underlies positional acuity.
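
The quoted widths can be checked with a line of arithmetic, under the assumption (made here for
illustration) that the motion is cut off by a rectangular time window of duration T. The transform
of such a window is a sinc function with zeros at multiples of 1/T, so the central lobe is 2/T wide
and the central lobe plus the first side lobe on each side is 4/T wide. The function name is
hypothetical.

```python
def sinc_lobe_widths_hz(duration_s: float):
    """Widths along f_t of the spread caused by a rectangular window of given duration."""
    main_lobe = 2.0 / duration_s              # between the first zeros at +/- 1/T
    with_first_side_lobes = 4.0 / duration_s  # between the second zeros at +/- 2/T
    return main_lobe, with_first_side_lobes


main, wide = sinc_lobe_widths_hz(0.150)
print(f"central lobe: {main:.1f} Hz, with first side lobes: {wide:.1f} Hz")
```

For T = 150 msec this gives roughly 13 Hz and 27 Hz, close to the rounded values of about 14 Hz and
28 Hz given in the text.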

Figure 9. a) The support on the (f_x, f_t) plane of the Fourier spectrum associated with continuous motion
of a vernier (see inset) at constant velocity -v. The slope of the line is v. The spectrum g(f_x, f_t) equals g(f_x)
on that line. Curtailing the duration of motion to T = 150 msec spreads the line into a bar-like
support, corresponding to a sinc function. b) The support of the Fourier spectrum associated with
Westheimer's type of experiment. The inset indicates that displaying the vernier stroboscopically at a
sequence of times with an interval Δt is equivalent to "looking" at the continuous motion of a vernier
through a series of temporal "slits". This has the effect of replicating the spectrum of fig. 9a along
the f_t axis in an infinite number of side lobes. The distance of the lobes on f_t is 1/Δt. The line
encounters the f_x axis at 1/(v · Δt) = 1/Δx (if Δx = 1', the distance of the side lobes on f_x is
60 cycles/deg). Notice that for any f_x, each lobe supports the same complex Fourier spectrum g(f_x).
c) The support of the Fourier spectrum associated with Burr's type of experiment. Displaying the
line segments of a vernier in the same position but with a slight delay is equivalent to looking at the
continuous motion of a vernier through the spatial window depicted in the inset (transparent slits in
an otherwise opaque screen). This corresponds to replicating the spectrum of fig. 9a along the f_x axis.
The distance of the lobes is 1/Δx, where Δx is the interval between successive slits in the spatial
window. At a given f_x, the Fourier spectrum g(f_x) of different lobes is in general different. d) The
support of the Fourier spectrum associated with the compensation experiment is the same as in fig. 9c.
The different window corresponding to this stimulus (see inset) corresponds, however, to a different
complex Fourier spectrum (see Appendix 2). From Fahle and Poggio (1981).

Fig. 9 summarizes the description of the two basic stimulus configurations used in this paper
according to the derivation outlined by Fahle & Poggio (1981). Westheimer's experimental situation
is equivalent to looking at the continuous motion of a vernier through a series of equidistant narrow
temporal slits within which the pattern is briefly visible (see fig. 9b). Burr's experimental situation
ideally corresponds to a vernier moving behind a spatial window with a series of equidistant narrow
slits (see fig. 9c). The spatial or temporal windows affect differently the spectrum of the retinal input.
As indicated in fig. 9, in the Westheimer situation the complex spatial spectrum of the pattern,
which contains amplitude and phase information, is replicated an infinite number of times along the
temporal frequency axis, whereas in the Burr case the same spectrum is replicated along the spatial
frequency axis. An important observation is that in fig. 9b (Westheimer stimulus) all lobes at any
given f_x support exactly the same complex spectrum g. This is not so in fig. 9c (Burr stimulus), where,
instead, all lobes have the same g at any given f_t. We re-emphasize that fig. 9 describes the physical
properties of the different stimuli without any reference to the human visual system.
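
The spacing of the side lobes described above depends only on the sampling intervals. The sketch
below evaluates it for the example given in the legend of fig. 9 (Δx = 1') and for one condition of
the compensation experiment (Δx = 2.5', Δt = 37 msec). The function names are illustrative
assumptions; the relations 1/Δt and 1/Δx are those stated in the legend.

```python
def lobe_spacing_temporal_hz(dt_s: float) -> float:
    """Distance between side lobes along f_t for a strobe interval dt (seconds)."""
    return 1.0 / dt_s


def lobe_spacing_spatial_cpd(dx_arcmin: float) -> float:
    """Distance between side lobes along f_x (cycles/deg) for stations dx arcmin apart."""
    return 60.0 / dx_arcmin


print(lobe_spacing_spatial_cpd(1.0))    # the example of the legend: stations 1' apart
print(lobe_spacing_spatial_cpd(2.5))    # stations 2.5' apart
print(lobe_spacing_temporal_hz(0.037))  # frames 37 msec apart
```

With stations 2.5' apart the replicas are only 24 cycles/deg apart on the f_x axis, well inside the
60 cycles/deg passed by the optics, so they cannot be removed by the eye's optical filtering alone.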

3.2.2 Computational Aspects of Interpolation: The Constant Velocity Assumption

More effective interpolation schemes are feasible if general constraints about the nature of the visual
input are incorporated directly in the computation. The key observation here is that the temporal
dependence of the visual input is usually due to movement of rigid objects, and that in everyday life
motion has a nearly constant velocity over the times and distances which are relevant to the
interpolation process (T < 100 msec and x < 1°). The constant velocity assumption leads to a more specific
form of the sampling theorem, given in Appendix 2 (see also Crick et al., 1980), which states formally
what is intuitively clear: the spatiotemporal sampling rate can become very low without losing
information. Interpolation schemes based on the constant velocity assumption exploit the equivalence of
the time and space variables (x = vt). From the point of view of filtering this means that spatial
and temporal interpolation cannot be performed independently as in the simple scheme of Fig. 8. In
the Fourier domain the constant velocity assumption constrains the spectrum of the visual input to
lie on the line support shown in Fig. 9a. In the ideal case of infinitely long motion the side lobes
generated by sampling either in time (Fig. 9b) or space (Fig. 9c) can always be excluded by means
of appropriate filters, if the precise value of v is known (e.g. by measurements). The recovery of
the original spectrum (Fig. 9a) corresponds to an ideal interpolation for arbitrarily large sampling
intervals (if v is known and different from zero). In the realistic case of finite duration of motion, finite
sampling intervals are enforced by the spread of the Fourier spectrum into a larger area, but the same
basic arguments still apply.

3.2.3 Implementing the constant velocity scheme 

An interpolation scheme of this type could be implemented simply by measuring the exact velocity
of movement and then reconstructing the spatiotemporal trajectory of the pattern for either temporal
or spatial information. Another, more attractive possibility is suggested by the idea, supported by
much psychophysical evidence, that in the human visual system there exist several channels at each
eccentricity, i.e. several sets of receptive fields tuned to different spatial sizes and with different
temporal properties. We imagine, following Burr (1979b), that these channels have somewhat
overlapping supports covering the region of the (f_x, f_t) Fourier plane which corresponds to the sensitive
range of the visual system. "Stasis" channels are tuned to high spatial frequencies (small receptive
fields) and low temporal frequencies (sustained properties); "motion" channels are tuned to low
spatial frequencies (large receptive fields) and high temporal frequencies (transient properties). Thus,
each channel is tuned to a different range of velocities, centred on the ratio between the optimal
temporal and spatial frequencies characteristic for the channel: stasis channels for instance are tuned
to low velocities whereas motion channels are tuned to high velocities. Fig. 10b shows a set of
idealized "velocity channels" of this type. Since each channel has its own cutoff in temporal and spatial
frequency, interpolation may be performed independently and with different characteristics within
each channel. In the Burr type of experiment stasis channels could correctly interpolate only patterns
displayed at small separations and low velocities, whereas motion channels could be effective (but not
so accurate) at large separations and high velocities by filtering out the side lobes arising from the
coarse spatial sampling. The complementary argument applies for coarse time sampling. As indicated
in Fig. 10b the stasis channels may suffer from aliasing at values of Δx for which the motion channels
interpolate correctly. We assume, then, that in this scheme the wrong channels are switched off by use
of velocity information.
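
The velocity tuning of a channel, taken above as the ratio between its optimal temporal and spatial
frequencies, can be illustrated with two made-up channels. The tuning values in the sketch below
are hypothetical and chosen only to show the direction of the effect; they are not measurements,
and the class and field names are assumptions of this example.

```python
from typing import NamedTuple


class Channel(NamedTuple):
    name: str
    spatial_cpd: float   # optimal spatial frequency, cycles/deg (hypothetical value)
    temporal_hz: float   # optimal temporal frequency, Hz (hypothetical value)

    def preferred_velocity(self) -> float:
        """Velocity (deg/sec) at the centre of the channel's tuning: f_t / f_x."""
        return self.temporal_hz / self.spatial_cpd


stasis = Channel("stasis", spatial_cpd=20.0, temporal_hz=2.0)
motion = Channel("motion", spatial_cpd=1.0, temporal_hz=10.0)

for channel in (stasis, motion):
    print(f"{channel.name}: {channel.preferred_velocity():.1f} deg/sec")
```

A channel tuned to high spatial and low temporal frequencies is centred on a low velocity, and the
reverse holds for a channel tuned to low spatial and high temporal frequencies.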

Fig. 10c shows a more realistic interpolation scheme of the same basic type. Instead of many
channels, each one sharply tuned to velocity and inactivated when the pattern does not move at its
characteristic velocity, there are a few channels coarsely tuned to velocity and without any precise
velocity sensitive inactivation, apart from directional selective properties.

In the light of this analysis we turn now to a detailed discussion of our experiments. Our main 
question concerns of course which type of interpolation scheme is actually used by our visual system. 



4.1 Westheimer's Acuity: Recovery of Spatial Offset

a) In Fourier terms, the aim of the interpolation process is to filter out the side lobes, preserving only
the central lobe, as the latter represents the Fourier spectrum of a continuously moving bar.

When both the time interval Δt between presentations and the velocity v are small, interlacing
of the side lobes in the Fourier spectrum is negligible. Temporal low-pass properties of the visual
pathway, as in the model of fig. 10a, suffice for eliminating the side lobes and thus achieve a correct
interpolation. When Δt is large, however, interlacing is considerable in the sense that, even for the
scheme of fig. 10c, there are one or more channels which mix the main lobe with at least one of the
side lobes. Because of the spread associated with the short duration of the motion sequence, actual
overlap between the lobes can be significant. It turns out, however, that this does not represent a
problem from the point of view of the spatial acuity measured in our experiments. At each f_x the
complex Fourier spectrum on all side lobes is exactly the same. Thus, the spatial spectrum is correct
irrespective of the temporal frequency and independently of the number of side lobes contained
in the support of the interpolation filters. At large Δx and high v, the presence of the side lobes
turns out to be even beneficial for vernier acuity; under these conditions high frequency channels,
which would not be stimulated by continuous motion, can obtain the correct spatial information from
the side lobes, which are an artefact of the discrete time presentations. On the whole, and in the
absence of a sophisticated interpolation process that always excludes all side lobes (such as the scheme
of fig. 10b), one expects vernier acuity to be rather invariant for a wide range of separations and
velocities. Our data conform well to these expectations. Notice that the presence of side lobes at high
velocities and large separations corresponds to the perception not of a moving bar but of a briefly
illuminated stationary grating - which carries however the correct spatial information. In this sense
at large Δx and high v interpolation fails to retrieve the "correct" spatiotemporal pattern, but still
preserves spatial acuity (even at extremely high speeds).

Figure 10. (a) The support on the Fourier plane of spatial and temporal frequencies of an interpolation
filter corresponding to a scheme such as Fig. 8. (b) The support on the Fourier plane of a set of spatiotemporal
filters ideally tuned to different velocities. A large number is needed to cover all velocities of interest. The
filters are assumed to be direction selective, since they only operate in the Fourier quadrants corresponding
to positive v = f_t/f_x in g(x + vt). A spatial pattern moving at constant velocity and sampled at spatial
intervals Δx has on this plane the support shown by fig. 9c. To avoid aliasing, the low velocity filters can
be "switched off" by information about the velocity of the motion. (c) A more realistic set of filters, broadly
tuned to different velocities. The stasis channel is tuned to low temporal and high spatial frequencies and
thus to low velocities. The motion channel is tuned to high temporal and low spatial frequencies and thus
to high velocities. Intermediate channels (not shown here) may also be present. The hatched areas represent
the support of such directional filters. Nondirectional filters would have also a symmetric support in the other
two quadrants. From Fahle and Poggio (1981).

b) The qualitative interpretation of our data in usual space-time variables is straightforward. Spatial
interpolation, for instance by appropriate receptive fields, takes place correctly for each frame (i.e. for
each station) even when temporal interpolation fails. Since our forced choice task measures only
spatial acuity, performance is in this case independent of the interpolation of the temporal dependence of
the visual input.

c) These results suggest that spatiotemporal interpolation is not performed by the "ideal"
interpolation scheme of Fig. 10b. For temporal aspects should then be retrieved correctly at all Δt, while
acuity for high velocities should be exactly as bad as for continuous motion. The one-channel scheme
of Fig. 10a could explain these data on positional acuity; but as pointed out by Burr (1979b, 1980) the
image should then be inevitably smeared at all but very low velocities.

4.2 Burr's Acuity: Interpolation of Temporal Offset

a) In Burr's experiment the situation is quite different. For any given f_x the side lobes contain
different parts of the original spectrum. Thus when more side lobes lie in the support of the same
channel (in fig. 10a or fig. 10c) there is a mixture of spatial frequencies, detrimental to acuity. One
understands, therefore, that acuity deteriorates considerably (see fig. 2) with increasing overlap among
the side lobes (large separations between the stations). At any given (large) separation, low velocities
bring about a considerable overlap between the side lobes. Higher velocities reduce the degree of
overlap at the expense of high spatial frequency information, which is filtered out by the temporal
cutoff(s) of the visual pathway (between 20 and 50 Hz, see for instance Kelly, 1979). Thus one
expects to find for each separation Δx an optimal velocity at which the side lobes just avoid overlap.
Assuming a spread of ±15 Hz the optimal velocity (in degrees/sec) should be v = 30 · Δx (Δx in
degrees), which is in rough agreement with the data of fig. 5b. When the velocity approaches zero the
line supports in fig. 10c all tend to lie on the f_x axis (notice that, because of the finite presentation time
T, the supports effectively overlap). In this situation information about the offset cannot be retrieved.
In the limit of very high velocity the set of lobes approaches the line spectrum of a stationary grating
with no offset. Notice that we assume for the scheme of fig. 10c that the vernier threshold is higher
when some of the channels signal zero offset while the others still "see" the correct offset.
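
The estimate v = 30 · Δx can be retraced as follows. In the Burr stimulus the replicas are 1/Δx
apart along f_x and all have slope v, so at any fixed f_x neighbouring lobes are v/Δx apart along
f_t. If each lobe is smeared by ±15 Hz, neighbours just clear each other when v/Δx = 30 Hz. The
sketch below encodes this reading of the argument; the function names and the test velocities are
illustrative assumptions, and the station separation of 2.5' is one of those used in the experiments.

```python
def optimal_velocity_deg_per_s(separation_deg: float, half_spread_hz: float = 15.0) -> float:
    """Velocity at which neighbouring side lobes just avoid overlap along f_t."""
    return 2.0 * half_spread_hz * separation_deg


def lobes_overlap(velocity_deg_per_s: float, separation_deg: float,
                  half_spread_hz: float = 15.0) -> bool:
    """True when the gap v / dx between neighbouring lobes is smaller than the spread."""
    temporal_gap_hz = velocity_deg_per_s / separation_deg
    return temporal_gap_hz < 2.0 * half_spread_hz


dx = 2.5 / 60.0  # stations 2.5' apart, expressed in degrees
v_opt = optimal_velocity_deg_per_s(dx)
print(f"optimal velocity for 2.5' stations: {v_opt:.2f} deg/sec")
print(lobes_overlap(0.5, dx), lobes_overlap(3.0, dx))
```

Below the optimal velocity the lobes overlap; above it they are separated, but more of the high
spatial frequency content falls beyond the temporal cutoff of the pathway.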

b) When the temporal component of the filters fails to interpolate between temporal frames, motion
is perceived as discontinuous. As a consequence the spatial interpolation process correctly signals zero
spatial offset for each frame. The critical strobe interval which yields optimal temporal interpolation
is not very different between the channels (see Fig. 5a). Though its performance may worsen at
high velocities, as for the continuous motion, it should be rather invariant with respect to Δx, the
separation between the stations. Fig. 5a shows that this does not happen. The opposite conclusion
holds for the scheme of Fig. 10a. Its performance should deteriorate rapidly for separations Ax 
between the stations larger than the distance between photoreceptors, which is in conflict with Burr's 
and our data. An interpolation scheme of the type of Fig. 10c seems consistent with these results: 
while small, slow "receptive fields" would be unable to interpolate correctly at large separations (Ax 
large), fast receptive fields could perform a correct interpolation, if the velocity is appropriate. 

The fact that spatial acuity is extremely good at separations up to 2.5' suggests that the interpolation 
channels are direction selective. 

4.3 Effect of Blur 

a) The interpolation scheme outlined in fig. 10c makes a rather strong prediction about the effect
of blur. In the Westheimer case blur can only degrade vernier acuity, since it eliminates the high
frequency channels. Blur of the Burr stimulus, however, should improve acuity at least at large
separations and high velocities, since it eliminates side lobes which signal the absence of an offset. Our data
are fully consistent with this expectation. A more perceptual but equivalent description of the effect
of blur is this. At high velocities and large separations there is a strong sensation of a grating of thin,
unbroken lines - corresponding to the side lobes seen by visual mechanisms tuned to low temporal
and high spatial frequencies - and a weak impression of a single moving target with a clear offset
- corresponding to the main lobe seen by mechanisms tuned to lower spatial and higher temporal
frequencies. This ambiguity is removed, as already noticed by Burr (1979), by the blur of the screen,
which suppresses the high frequency grating.

b) In other terms, blur eliminates the contribution of the small receptive fields which are unable
to interpolate correctly at large separations and therefore signal zero offset. The large receptive fields,
however, remain largely unaffected by blur.

c) The effectiveness of blur in improving vernier acuity at large Ax shows that our visual system 
does not normally have the intrinsic possibility of switching off the wrong channels as assumed in the 
scheme of Fig. 10b. 

4.4 Spatial vs. Temporal Compensation 

a) This stimulus situation corresponds to looking at the continuous motion of a vernier through the
spatial window shown in the inset of fig. 9d. The resulting Fourier support is again as in fig. 9c:
here, however, the main lobe signals no offset, corresponding to precise spatiotemporal
compensation, whereas the other lobes all signal the spatial offset between the upper and lower grating of
the window. In other words, exact compensation between space and time is realized only in the
main, correct lobe. Thus, the spatial offset should dominate as soon as the side lobes are "seen"
by some of the channels of fig. 10c. This is increasingly so for larger separations Δx between the
stations. Correspondingly, the perception of the stationary grating carrying spatial offset information
(the broken slits in the window of fig. 9d) is expected to dominate at large separations and velocities.
Again our data are consistent with these expectations. Even at relatively small separations between
the stations (see fig. 7) the system does not achieve a perfect interpolation - that is, removal of all
side lobes. Only in this case would the temporal offset exactly cancel the spatial offset. As expected,
blur improves compensation, since it helps to remove the "wrong" side lobes, which carry information
only about the spatial offset.

b) This experiment combines Burr and Westheimer stimuli. Since spatial interpolation always
retrieves the spatial offset, this dominates for all cases in which the temporal component of
interpolation is not fully correct.

5. Discussion 

To summarize, the psychophysical experiments reported here suggest that spatiotemporal
interpolation in the visual system, remarkable though it is, is far from being perfect and flawless. Ideal
interpolation is equivalent to filtering out the side lobes in the Fourier spectrum arising from the
discrete presentations. The task is easy at small separations but requires in principle complex filters for
large separations (see Crick et al., 1980). As our data suggest, our visual systems do not seem to use
a very sophisticated spatiotemporal interpolation process. The side lobes are not effectively filtered
out under all conditions. Spatiotemporal interpolation, then, can be considered as a direct
consequence of the spatial and temporal properties of early vision, in terms of an interpolation scheme of
the type of fig. 10c. The existence of independent channels tuned to different spatial and temporal
frequencies seems to account for the spatiotemporal interpolation revealed by our experiments. A
detailed theoretical analysis with the help of appropriate computer experiments is necessary for a
quantitative evaluation of interpolation models of this type.

5.1 Explicit or implicit interpolation? 

Interpolation can be regarded as a spatiotemporal filtering of the input transmitted from the retina.
This is the point of view taken in this paper. We cannot advance any hypothesis as to where this
filtering stage may be localized in the brain on the basis of our psychophysical data alone. Throughout
this paper we have used the term "interpolation" without necessarily implying a direct reconstruction
of the pattern of visual activity, say its zero-crossing profile in the various channels, somewhere in the
visual pathway. Clearly, hyperacuity may simply rely on a specialized routine operating on a small
region of the image to answer specific questions, like the right-left choice in a vernier task. Thus
the interpolation scheme suggested by our data may be implemented as an "implicit interpolation",
that is, as a computational process involving manipulation of symbolic quantities; or it may depend
on an "explicit reconstruction" of a (coded) version of the array of photoreceptor activity on a fine
retinotopic grid of neurons. These extreme possibilities - and all in between - can be implemented in a
variety of ways. For instance, activity may be reconstructed on the fine topographic grid
of layer IVcβ by an automatic, parallel process.

On the other hand, a specific, more symbolic process could read the output of retinal ganglion cells 
and perform the correct interpolation for any desired position and time. In this case interpolation 
would be implicit and mixed with the decision process itself. 

In the first case, the decision routine (is the upper segment to the right or to the left?) would
operate on an interpolated version of the image. Thus, "reprogramming" of the vernier routine may
not be expected to affect the interpolation process but only the detection criteria, contrary to the
second case, in which different detection strategies may influence interpolation.

5.2 Are the Psychophysical Channels the Interpolation Filters? 

Our data support interpolation schemes of the type outlined in Fig. 10c. They say, however, neither
how many independent channels are needed, nor exactly what their spatiotemporal properties are.
Our results seem consistent with standard characterizations of their spatial and temporal properties
(Campbell and Robson, 1968; Burr, 1979b; see also Marr et al., 1980; Wilson and Giese, 1977; Wilson
and Bergen, 1979).

These observations suggest the interesting idea that the spatial frequency tuned channels present
in early human vision may be the interpolation filters themselves. To be completely explicit let us
consider simple examples of how an interpolation scheme such as Fig. 10c might be implemented
in the visual system. The first possibility is that the image is filtered before interpolation through
various independent channels. Retinal or LGN ganglion cells of different sizes could represent the
image filtered at different resolutions. Later in the visual pathway each of these representations would
be independently interpolated on a finer cortical grid of cells with a receptive field very similar to
the corresponding LGN cells. Another possibility is that only two of the channels are present at the
precortical level (e.g. X and Y) and that the measured psychophysical channels represent interpolation
filters operating on their X and Y input at the cortical level. In this second case one would expect only
two sizes of receptive fields - at each eccentricity - in the retina and LGN but a scatter of sizes in the
cortex (possibly in IVc). Thus the same retinal channel may be interpolated in two different ways, by
small cortical receptive fields and by large ones, the first reconstructing the high frequency content
of the retinal channel and the second emphasizing its coarser details. Notice that as a consequence
cortical (interpolation) channels may have a narrower bandwidth than retinal ones.

5.3 A prediction: interpolation must be direction selective 

An explicit interpolation scheme of this type consists of a set of motion channels with direction
selective properties, in the sense that the spatiotemporal interpolation filter thereby implemented must
depend (in one dimension) on the sign of v (see appendix of Fahle and Poggio, 1981). As a
consequence the interpolation channels should have some type of direction selective property; furthermore,
cells of layer IVc - if they are involved at all - should show, despite their centre-surround receptive
field, some non-standard direction selective property.



6. Interpolation in the perifoveal visual field: does aliasing occur?

In the perifoveal retina, the spacing of the ganglion cells increases, as Barlow pointed out, whereas
the optical cut-off remains approximately the same (for instance at 10° eccentricity; see Weale, 1976).
The grid of ganglion cells is, however, matched to the spatial cut-off of the signal thereby represented:
in the cat, Peichl and Wässle (1979) have shown that receptive field diameter and ganglion cell
separation both increase towards the periphery so that sampling in the array of ganglion cells takes place at
the interval appropriate to the cut-off frequency passed by the larger receptive fields. Thus, the grid of
ganglion cells is likely to satisfy the sampling theorem (see Hughes, 1981).

A more serious, and so far unsolved, problem is whether in the perifoveal visual field the signal
represented by the ganglion cells suffers from aliasing, i.e., undersampling, at the level of the
photoreceptors. If only cones are involved, aliasing seems unavoidable for eccentricities larger than
about 5° to 10°. The classical sampling theorem requires that the signal be lowpass filtered before
sampling in order to avoid overlap of the sidelobes in the Fourier spectrum (i.e., aliasing). Lowpass
filtering after sampling cannot always avoid aliasing.

It is easy to show that ideal lowpass filtering after sampling eliminates overlap of the sidelobes
only up to sampling intervals that are twice the limit set by the sampling theorem.² Preliminary
computer experiments support these conclusions for the approximately lowpass filtering performed by a
center-surround receptive field; in this case, however, effectiveness of lowpass filtering decreases more
gradually with increasing sampling intervals.
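The factor of two can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the signal is taken to be ideally bandlimited to a bandwidth W, and the function name `clean_band_edge` is hypothetical. Sampling at interval T replicates the spectrum at multiples of 1/T, so the first replica reaches down to 1/T - W; an ideal lowpass filter applied after sampling can only keep frequencies below that edge, and the edge falls to zero once T reaches 1/W, twice the limit 1/(2W) set by the sampling theorem.

```python
def clean_band_edge(bandwidth, sampling_interval):
    """Highest frequency left free of spectral overlap after sampling.

    Sampling at interval T replicates the spectrum at multiples of 1/T.
    The first replica reaches down to 1/T - W, so frequencies below that
    edge (and never above W itself) stay uncontaminated.
    """
    lowest_replica_edge = 1.0 / sampling_interval - bandwidth
    return max(0.0, min(bandwidth, lowest_replica_edge))


W = 10.0                               # assumed signal bandwidth (cycles per unit length)
nyquist_interval = 1.0 / (2.0 * W)     # limit set by the sampling theorem

for factor in (1.0, 1.5, 2.0, 2.5):
    T = factor * nyquist_interval
    print(factor, clean_band_edge(W, T))
```

At the sampling-theorem limit the whole band survives; beyond it the recoverable band shrinks, which is the loss of high spatial frequencies mentioned in the footnote, and nothing remains once the interval is twice the limit.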

This scheme is somewhat supported by Poliak's data showing that visual acuity thresholds increase
with eccentricity more than the separation between cones. Convergence of cones on X ganglion cells
is therefore likely to increase with eccentricity.

If aliasing cannot be fully avoided, hyperacuity threshold must rise faster with eccentricity than
visual resolution thresholds, a result which has been recently established by Westheimer (1982). If the
reason for this were indeed aliasing, blur of the vernier pattern should improve vernier acuity in the
periphery, at least in the absence of noise. Blur of the pattern corresponds to lowpass filtering of the
signal before sampling, as required by the sampling theorem. Preliminary experiments performed to
test this prediction indicate, however, that blur may improve hyperacuity only slightly, if at all (Fahle
and Poggio, 1981; Westheimer, pers. comm.; Fahle, pers. comm.).

A possible explanation for this small effect arises if input from rods (in addition to cones) is also
allowed. Aliasing in the periphery could then be largely avoided at all eccentricities by lowpass
filtering the image before sampling, by pooling together inputs from all neighboring photoreceptors
(rods and cones) via either gap junctions or synaptic coupling in second order neurons. If this
prediction were correct, the decrease of vernier acuity with eccentricity would not depend on aliasing but
would simply be a graded phenomenon due to the increasing spacing (in terms of visual angle) of
the cortical grid and to a decreasing signal to noise ratio (because of the decreasing density of cells).
The ineffectiveness of blur is consistent with this scheme. A critical test of this hypothesis may be
obtained by measuring vernier acuity in the periphery under different conditions of light adaptation.
An important corollary of this prediction is that the space constant of the electrical coupling should
increase proportionally to cone spacing from the fovea to the periphery (the rod network may have
interesting spatiotemporal properties (see Detwiler et al., 1978), possibly useful for moving patterns).
Several morphological studies have demonstrated apparent connections between cones as well as
between rods and cones in the vertebrate retina (see for instance Raviola and Gilula, 1975). Nelson
(1977) has provided physiological evidence for the cat that cones have inputs from rods, probably
mediated by the rod-cone gap junctions. The above conjecture would explain why coupling of this
type is needed already at the level of the photoreceptors, whereas improvement of signal-to-noise
ratio could be achieved in a simpler way with convergence of signals at a later level in the retina.

² This is achieved at the expense of a much more extensive loss of high spatial frequencies than in the case of lowpass
filtering before sampling. Localization of an isolated feature like a zero-crossing is, however, rather unaffected by loss
of high spatial frequencies, in the ideal case of small noise level.
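The pooling argument can be illustrated numerically. The following is a minimal sketch and not a retinal model: the dense receptor mosaic, the Gaussian coupling profile, the grating frequency and every function name are assumptions made for the illustration. A grating above the Nyquist limit of a sparse subset of receptors (every 8th one, standing in for the cones) is sampled either directly or after each receptor has been averaged with its neighbours.

```python
import math


def grating(n_receptors, cycles_per_receptor):
    """Sinusoidal image as seen by a dense row of receptors."""
    return [math.cos(2.0 * math.pi * cycles_per_receptor * i) for i in range(n_receptors)]


def pool_neighbours(signal, space_constant):
    """Gaussian-weighted average over neighbouring receptors (circular boundary)."""
    radius = int(3 * space_constant)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / space_constant) ** 2) for k in offsets]
    total = sum(weights)
    n = len(signal)
    return [sum(w * signal[(i + k) % n] for k, w in zip(offsets, weights)) / total
            for i in range(n)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


step = 8                        # keep every 8th receptor: Nyquist limit 1/16 cycle per receptor
image = grating(400, 0.1)       # 0.1 cycle per receptor lies above that limit, so it aliases

raw_samples = image[::step]                           # sparse sampling without coupling
pooled_samples = pool_neighbours(image, 4.0)[::step]  # coupling first, then sparse sampling

aliased_rms_raw = rms(raw_samples)
aliased_rms_pooled = rms(pooled_samples)
print(aliased_rms_raw, aliased_rms_pooled)
```

Without coupling the sparse samples carry the grating at full amplitude but at a spurious low frequency; with coupling the same component is strongly attenuated before it can alias. The space constant of 4 receptors was chosen by hand for a sampling step of 8, in the spirit of the corollary that the coupling should scale with cone spacing.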

6.1 Significance for information processing and machine vision 

There are various methods for reconstructing the original signal at high resolution by interpolating
values measured at widely spaced intervals. The best known approach to this problem is based on
the Shannon sampling theorem and on its various extensions. For static images interpolation of this
type can provide a resolution much higher than the original sampling grid. Since in our framework
the position of zero-crossings (and not the grey level values) is important, Hildreth and Poggio have
examined the problem of interpolating the values of the $\nabla^2 G$ convolution in order to obtain precisely
the location of zero-crossings. Analytical arguments, supported by computer experiments, have shown
that the position of a zero-crossing can be interpolated precisely in terms of very simple interpolation
functions, even by linear interpolation. For time-varying images the situation is more complicated. In
the classical sampling theorem, interpolations in space and time are performed independently, since
the temporal dependence of the input is not constrained in any way. Interpolation algorithms based
on the constant velocity assumption discussed earlier could achieve higher spatio-temporal resolution
for objects in motion, as long as the constant velocity assumption is not grossly incorrect, despite
low spatial and temporal sampling rates. Positional acuity for the image features, e.g., the
zero-crossings, although desirable, is not the only goal of this spatiotemporal interpolation stage. A filter
that correctly interpolates the sampled image automatically avoids any defect in the representation of
the image since it reconstructs the "original" input. It avoids in particular motion smear; and it "fills
in" possible gaps either in space or time, where or when the sampled input is missing. Real time
vision machines may well need such an interpolation stage and it will be interesting to see the form
and the performance of a computer implementation. In particular, the "gap junction" scheme for
avoiding aliasing with sparse sampling intervals may be usefully implemented in future CCD devices.
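The remark that a zero-crossing can be located by linear interpolation is easy to try out. The sketch below is an illustration under assumptions, not the procedure used by Hildreth and Poggio: the edge profile is the one-dimensional response of a second-derivative-of-Gaussian filter to a step edge, which up to scale is the first derivative of a Gaussian centred on the edge, and the function names are hypothetical.

```python
import math


def edge_response(x, edge_position, sigma):
    """Stand-in for a filtered step edge: first derivative of a Gaussian
    centred on the edge, which changes sign exactly at edge_position."""
    d = x - edge_position
    return -d * math.exp(-d * d / (2.0 * sigma * sigma))


def interpolated_zero_crossings(samples, spacing=1.0):
    """Locate sign changes between neighbouring samples by linear interpolation."""
    crossings = []
    for k in range(len(samples) - 1):
        left, right = samples[k], samples[k + 1]
        if left == 0.0:
            crossings.append(k * spacing)
        elif left * right < 0.0:
            fraction = left / (left - right)   # where the straight line through both samples hits zero
            crossings.append((k + fraction) * spacing)
    return crossings


# Samples on an integer grid; the true edge sits between two grid points.
samples = [edge_response(float(k), 3.3, 2.0) for k in range(8)]
estimates = interpolated_zero_crossings(samples)
print(estimates)
```

With the edge at 3.3 and a filter width of two sample spacings, the straight-line estimate falls within a small fraction of the sample spacing of the true position, even though no sample lies on the edge itself.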
Appendix 1a

Logan's results apply to $B_\infty(\lambda)$ functions, i.e., the restrictions to the real line of entire functions
of exponential type $\lambda$ whose growth (on $\mathbb{R}$) is less than exponential. In particular, they apply to
periodic functions with the exception of theorem 4 (Logan, 1977), which can be specialized to periodic
functions (Logan, personal communication). If we restrict ourselves to trigonometric polynomials, it is
possible to illustrate Logan's results in a simple way. It should be stressed, however, that trigonometric
polynomials are a very special case and in general erroneous inferences can be made from their
special properties. With this "caveat" in mind, let us consider the real band limited function



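$$h(t) = \sum_{n=-N}^{N} C_n e^{int}, \qquad C_n = \overline{C_{-n}} \tag{1}$$

which can be extended to the complex plane as

$$h(z) = \sum_{n=-N}^{N} C_n e^{inz}.$$

$h(z)$ is for instance bandpass with one octave bandwidth if

$$C_n = 0, \qquad |n| < \frac{N}{2}.$$

The complex free zeros of $h(z)$ are the complex zeros of $h(z)$ in common with its Hilbert transform
$\hat{h}(z)$, where

$$\hat{h}(z) = \sum_{n=-N}^{N} \hat{C}_n e^{inz}, \qquad \hat{C}_n = -i\,\mathrm{sign}(n)\,C_n \tag{2}$$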



Let us define, given $h(z)$,

$$P(z) = \sum_{n=A+1}^{N} C_n e^{inz}, \qquad N(z) = \sum_{n=-N}^{-(A+1)} C_n e^{inz} \tag{3}$$

where $A$ is the low-frequency boundary of the spectrum of $h(z)$ (assumed in the following bandpass).

Then the free zeros of $h(z)$ are completely characterized by the following three equivalent
formulations:

The free zeros of $h(z)$ are such $z^*$ that

(a) $P(z^*) = 0, \quad N(z^*) = 0$

(b) $h(z^*) = \hat{h}(z^*) = 0$

(c) $P(z^*) = P(\overline{z^*}) = 0$

Observe that if $z$ is a zero, $\bar{z}$ is also a zero of $h(z)$; and if $z$ is a zero, $z + 2k\pi$, $k$ an integer, is also a
zero.

The coefficients $C_n$ of $h(z)$ may be determined by the $2N$ roots of $h(z)$ as the solutions of the
system of $2N$ equations

$$\sum_{n=-N}^{N} C_n e^{inz_1} = 0, \quad \ldots, \quad \sum_{n=-N}^{N} C_n e^{inz_{2N}} = 0 \tag{4}$$



Let us now rewrite

$$h(z) = \sum_{n=-N}^{N} C_n e^{inz}$$

as

$$h(\zeta) = \zeta^{-N} \sum_{n=0}^{2N} g_n \zeta^n \tag{5}$$

with

$$\zeta = e^{iz}, \qquad g_n = C_{n-N}, \qquad \mathrm{Re}[z] \in [0, 2\pi], \qquad N = 2M.$$

Thus the nontrivial zeros of $h(z)$ coincide with the zeros of $\sum_{n=0}^{2N} g_n \zeta^n$, that is, a polynomial of
order $2N$. If the $2N$ roots $\zeta_i$ were known, it would be possible to write $2N$ equations in the
$2N + 1$ real unknowns ($C_n$):

$$\sum_{n=0}^{2N} g_n \zeta_1^n = 0, \quad \ldots, \quad \sum_{n=0}^{2N} g_n \zeta_{2N}^n = 0 \tag{6}$$

with

$$\zeta_i = e^{iz_i}.$$



Since the determinant of the roots is a Vandermonde determinant, it always has maximum rank if
the roots are distinct. The question is under which conditions the real roots alone determine, apart
from a multiplicative constant, the set of $C_n$, i.e. $h(z)$. Clearly, multiple zeros, in particular multiple
real zeros, cannot be allowed. Observe that if more than $2N$ real zero-crossings were available (in
a basic period) then $h \equiv 0$.

Under the bandpass condition ($C_n = 0$ for $|n| \le A$) there are at least $2A$ real zero-crossings per
period. The real unknowns are $2b$, $b = N - A$, that is the number of non-zero $C_n$ between $N$ and
$A$, counted twice because they are complex numbers. A sufficient condition to ensure that there are
enough zero-crossings, and thus equations, is $A = M = N/2$, i.e., $C_n$ (for $n > 0$) all non-zero in $(M,
2M]$. Notice that $[M, 2M]$, i.e., one octave bandwidth, would not be sufficient: in this case there would
be at least $2M$ real roots but $2(M + 1)$ unknowns $C_n$. The matrix associated to the homogeneous
equation in the "roots"

$$\begin{pmatrix}
e^{-i2Mt_1} & \cdots & e^{-i(A+1)t_1} & e^{i(A+1)t_1} & \cdots & e^{i2Mt_1} \\
\vdots & & \vdots & \vdots & & \vdots \\
e^{-i2Mt_{2M}} & \cdots & e^{-i(A+1)t_{2M}} & e^{i(A+1)t_{2M}} & \cdots & e^{i2Mt_{2M}}
\end{pmatrix} \tag{7}$$

has rank at most $2M - 1$ (since there exist $C_n$ such that $\sum C_n e^{int}$ vanishes for $t =
t_1, \ldots, t_{2M}$) and this would just not suffice to specify the $C_n$ modulo a multiplicative constant.
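The lower bound on the number of real zero-crossings can be checked numerically for a particular case. The sketch below is an illustration under assumptions: the trigonometric polynomial (harmonics 2 and 3, with hand-picked amplitudes and phases) and the function names are hypothetical. It relies on the classical fact that a real trigonometric polynomial whose lowest harmonic is n and highest harmonic is N changes sign at least 2n and at most 2N times per period.

```python
import math


def trig_polynomial(t, amplitudes, phases):
    """Real trigonometric polynomial sum_n a_n cos(n t + phi_n); dictionary keys are the harmonics n."""
    return sum(a * math.cos(n * t + phases[n]) for n, a in amplitudes.items())


def count_sign_changes(amplitudes, phases, grid_points=20000):
    """Count sign changes over one period [0, 2*pi), including the wrap-around pair."""
    values = [trig_polynomial(2.0 * math.pi * k / grid_points, amplitudes, phases)
              for k in range(grid_points)]
    return sum(1 for k in range(grid_points)
               if values[k] * values[(k + 1) % grid_points] < 0.0)


# Bandpass example: only harmonics 2 and 3 are present.
amplitudes = {2: 1.0, 3: 0.5}
phases = {2: 0.0, 3: 1.0}
crossings = count_sign_changes(amplitudes, phases)
print(crossings)
```

Counting on a fine grid can miss a pair of nearly coincident zeros, so the count is a lower estimate; for a generic example such as this one it falls between twice the lowest and twice the highest harmonic.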

Although the less-than-1 octave condition is sufficient to ensure enough zero crossings, it is by no
means necessary. In fact, there are classes of bandpass signals with a larger bandwidth and still enough
zero-crossings.

In any case, even when there is a sufficient number of zero-crossings, the question still remains
of whether the determinant of the matrix of the "roots" $|e^{int_j}|$ has maximum rank ($2M - 1$) and
therefore the $C_n$ can be determined (modulo a multiplicative constant). If the rank is less than
$2M - 1$ then the $C_n$ are not uniquely determined and as a consequence $h(z)$ is not determined by its
real roots. Logan (1977 and personal communication) has proved that

a) if a free zero exists then $h(z)$ is not uniquely determined by its real roots and

b) if there are no free zeros, $h(z)$, provided its bandwidth is appropriate, is determined, modulo a
multiplicative constant, by its real zero-crossings.

In the following, we will outline Logan's main theorems for the case of trigonometric polynomials. 

Theorem 1 

If $h(z)$ has 1 or more free zeros, the rank $r$ of the determinant of the roots is $r < 2M - 1$.

Proof 

$h(t)$ can be written as

$$h(t) = P(t) + N(t) = e^{-i2Mt} \Big\{ \sum_{n=0}^{M-1} g_n e^{int} \Big\} + e^{i(M+1)t} \Big\{ \sum_{n=0}^{M-1} p_n e^{int} \Big\} \tag{8}$$

$$\phantom{h(t)} = g_{M-1}\, e^{-i2Mt} \prod_{j=1}^{M-1} (e^{it} - e^{ia_j}) + p_{M-1}\, e^{i(M+1)t} \prod_{j=1}^{M-1} (e^{it} - e^{ib_j}).$$

If $\varepsilon$ is a free zero of $h(t)$ then we can divide $h(t)$ by the real function $f(t)$ defined through

$$(e^{it} - e^{i\varepsilon})(e^{it} - e^{i\bar\varepsilon}) = \Big(2i\,e^{i(t+\varepsilon)/2} \sin\frac{t - \varepsilon}{2}\Big)\Big(2i\,e^{i(t+\bar\varepsilon)/2} \sin\frac{t - \bar\varepsilon}{2}\Big), \qquad f(t) = 4 \sin\frac{t - \varepsilon}{2} \sin\frac{t - \bar\varepsilon}{2} \tag{9}$$

with $f(t)$ real (the product on the left differs from $f(t)$ only by a factor of unit modulus).
The resulting $h(t)/f(t)$ is still a periodic bandpass function of the form

$$\frac{h(t)}{f(t)} = \sum_{n=-2M}^{-M} C'_n e^{int} + \sum_{n=M}^{2M} C'_n e^{int} \tag{10}$$

and actually of reduced bandwidth. Multiplication of $h(t)/f(t)$ by any arbitrary $[a - \cos(t - \alpha)]$, $a > 1$,
which can always be written as $C \sin\frac{t - \gamma}{2} \sin\frac{t - \bar\gamma}{2}$ for a suitable complex $\gamma$, provides a periodic bandpass function with the
same bandwidth as the original $h(t)$ but different from it despite the same real zeros. Notice that if $\varepsilon$ is
not a free zero, $h(t)/f(t)$ will no longer be a periodic bandpass function. This means that the determinant
associated with the homogeneous equation 7 has at most rank $r = 2M - 2$.

Theorem 2 

If $h(t)$ has no multiple and no free zeros the rank of the determinant of the real "roots" is $r =
2M - 1$.

Proof 

Clearly $r$ cannot be $r > 2M - 1$. If $h_1$ and $h_2$ have the same bandwidth and the same real zeros,
then



$$h_1 h_2 + \hat{h}_1 \hat{h}_2 = \sum_{n=0}^{2M-1} g_n e^{int} \tag{11}$$

$$h_1 \hat{h}_2 - \hat{h}_1 h_2 = \sum_{n=0}^{2M-1} p_n e^{int} \tag{12}$$

as it is easy to check by substitution of equation (2). If the real zeros are $2M$ in number and distinct,
the Vandermonde determinant associated to the real roots of equation 12 is different from zero; thus,
the unknowns $g_n$ are identically zero. The same argument implies that all $p_n$ are also identically zero.

Thus, $h_1/h_2 = \hat{h}_1/\hat{h}_2 = M(t)$.
Now $M(t)$ is any function with the same zeros (real and complex) of $h_1$. But $h_1$ is a bandlimited
function $h_1(t) = \sum_{n=-2M}^{2M} C_n e^{int}$ which is uniquely determined (apart from a multiplicative constant)
by its $4M$ real and complex zeros. Thus $h_1$ and $h_2$ must coincide identically and the theorem follows.
The theorem can be generalized allowing for real zeros.

Finally, a short remark about the multiple and free zero condition. It is rather intuitive that
multiple and free zeros are not generic; assume, for instance, that the polynomial $\sum_{n=-N}^{N} C_n e^{int}$ has a
free zero. It is enough to perturb one of the coefficients $C_n$ to annihilate the free zero. Similarly, if
the trigonometric polynomial is a sample function of a random process, the coefficients $C_n$ would be
random numbers, as well as the zeros of the associated polynomial $\prod_{i=1}^{2N} (\zeta - \zeta_i)$. The probability that
a zero is free (i.e. with $\zeta_i = \rho e^{i\theta}$, $\zeta_i$ is free iff $\frac{1}{\rho} e^{i\theta}$ is also a zero) is usually very low.

Appendix 1b

Logan's result can be extended to the case of a two-dimensional entire function $f(x, y)$ if it is
bandpass in $x$ with a band-width strictly less than an octave and band-limited in $y$. In this case, the
restriction of $f$ to a one-dimensional line $l_x$ in the $x, y$ plane parallel to the $x$ axis will be bandpass
with less than an octave band-width. Provided the free-zero condition is met, Logan's theorem tells
us that the zeros of $f$ along $l_x$ determine $f$ there up to a multiplicative constant. To determine $f$
everywhere up to a multiplicative constant, these parallel slices must be tied together.

The following lemma shows that Logan's theorem can be invoked for $f$ restricted to a line $l$ which
is not parallel to the $x$ axis. $l$ will intersect all slices $l_x$ parallel to the $x$ axis, so determining $f$ up to a
multiplicative constant on $l$ determines $f$ up to the same constant along each of the slices $l_x$.

Lemma 

If $f(x, y)$ is ideally bandpass with band-width strictly less than an octave in $x$ and band-limited in $y$
then there is an $\varepsilon > 0$ such that $f$ along all slices $l$ which make an angle $< \varepsilon$ with the $x$ axis will
be bandpass with band-width less than an octave.

The support of the Fourier transform of $f$ is confined in $\omega_x$ to the intervals $I_1 = (-2a + \delta, -a -
\delta)$ and $I_2 = (a + \delta, 2a - \delta)$ and in $\omega_y$ to the interval $J = (-b, b)$ for some positive $\delta$, $a$, and $b$.
Observe that the support of the Fourier transform of a slice $l$ through $f$ is confined to the projection
of the support of the Fourier transform of $f$ onto the $\omega_l$ axis. The rectangles $I_1 \times J$ and $I_2 \times J$ will
project into the intervals $(-2a, -a)$ and $(a, 2a)$ on $\omega_l$ provided that $l$ makes a sufficiently small angle
with the $x$ axis.
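The projection argument can be made concrete with a few numbers. The sketch below is an illustration under assumptions: the values chosen for $a$, $\delta$ and $b$ are arbitrary, the function names are hypothetical, and only the positive-frequency rectangle is treated since the other one is its mirror image. A point $(\omega_x, \omega_y)$ projects onto a line at angle $\theta$ to the $x$ axis at $\omega_x \cos\theta + \omega_y \sin\theta$.

```python
import math


def projected_band(a, delta, b, angle):
    """Interval covered on the slice-frequency axis when the rectangle
    (a + delta, 2a - delta) x (-b, b) is projected onto a line at `angle`
    radians to the x axis: w_l = w_x * cos(angle) + w_y * sin(angle)."""
    c, s = math.cos(angle), abs(math.sin(angle))
    return (a + delta) * c - b * s, (2.0 * a - delta) * c + b * s


def stays_within_octave(a, delta, b, angle):
    """True if the projected band still lies inside the octave (a, 2a)."""
    low, high = projected_band(a, delta, b, angle)
    return a <= low and high <= 2.0 * a


a, delta, b = 1.0, 0.1, 0.5     # assumed band edges, margin and vertical bandwidth
for angle in (0.0, 0.05, 0.5):
    print(angle, projected_band(a, delta, b, angle), stays_within_octave(a, delta, b, angle))
```

For a small tilt the margin $\delta$ absorbs the spread caused by the vertical bandwidth $b$, so the slice remains bandpass within the octave; for a large tilt the projected band spills outside it, which is why the lemma only holds for angles below some $\varepsilon$.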




Appendix 2 

We consider a one dimensional pattern $g(x)$. Arbitrary, non rigid movement of this pattern produces
a spatiotemporal image $g(x, t)$. Rigid movement of the same pattern at constant speed gives an image
$g(x, t) = g(x - vt)$. We state here the classical sampling theorem for the first case and an appropriate
modification of it for the second case.

Theorem 1 (classical sampling theorem) 

If a signal $g(x, t)$ is bandlimited in spatial and temporal frequencies it can be recovered exactly by
independent interpolation in space and time of its sampled values, provided that the sampling
separations $\Delta x$ and $\Delta t$ are such that $\Delta x < 1/(2 f_x^c)$ and $\Delta t < 1/(2 f_t^c)$, where $f_x^c$ and $f_t^c$ are the spatial and
temporal bandwidths.

Theorem 2 (Crick et al., 1981; Fahle & Poggio, 1981)

Assume that the spatiotemporal signal $g(x, t) = g(x - vt)$. The function $g$ can then be reconstructed
at the desired resolution from its spatial (temporal) samples. The required sampling density can be
decreased arbitrarily by knowledge of the velocity $v$. If only the sign of the velocity is available the
maximum sampling distance can be twice the classical limit for stationary patterns.
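The intuition behind Theorem 2 can be sketched with a toy bookkeeping example. It is not the interpolation filter discussed in the text, and the sensor spacing, velocity, frame interval and function names are all assumptions made for the illustration. When the pattern moves rigidly, a sensor at position $x$ read at time $t$ measures the pattern at $u = x - vt$, so with $v$ known the readings from successive frames can be placed on the pattern and fill in the space between the sensors.

```python
def pattern_positions(sample_spacing, n_sensors, velocity, frame_interval, n_frames):
    """Positions u = x - v*t on the moving pattern that get measured when
    sensors at x = k * sample_spacing are read at times t = j * frame_interval."""
    positions = set()
    for j in range(n_frames):
        for k in range(n_sensors):
            positions.add(k * sample_spacing - velocity * j * frame_interval)
    return sorted(positions)


def largest_gap(sorted_positions):
    """Largest distance between neighbouring measured positions."""
    return max(b - a for a, b in zip(sorted_positions, sorted_positions[1:]))


one_frame = pattern_positions(4.0, 5, 1.0, 1.0, 1)
four_frames = pattern_positions(4.0, 5, 1.0, 1.0, 4)
print(largest_gap(one_frame), largest_gap(four_frames))
```

With sensors 4 units apart and the pattern moving 1 unit per frame, four frames are enough to measure the pattern on a grid of 1 unit. A coarser sensor grid can be compensated by more frames, which is the sense in which knowledge of the velocity relaxes the spatial sampling requirement.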

Comments 

a) The proof of these results can be easily obtained from diagrams in the $f_x$-$f_t$ Fourier plane (see
Fig. 9; Crick et al., 1981).

b) Theorem 1 requires the function $g(x, t)$ to be bandlimited before sampling takes place, since
overlap of the frequency lobes as an effect of sampling usually leads to an irretrievable loss of
information. This condition is not needed in theorem 2. Overlap never occurs (for infinitely long motion)
even when the pattern $g(x)$ is not bandlimited in spatial frequency. Any desired part of the original
spectrum can be recovered exactly (without aliasing) by an appropriate interpolation filter.

c) The spatiotemporal filter implementing the interpolation depends on $v$. Suppose, for instance, that
an interpolation scheme is endowed with direction selective properties (i.e. that it uses information about the
sign of $v$): it can be shown that the new spatiotemporal filter is obtained by adding to the
spatiotemporal impulse response its Hilbert transform with a sign controlled by the sign of $v$ (in the case of
Fig. 8 the Hilbert transform of the spatial point spread function is an odd function).